Analysis of Covid-19 data using discrete Marshall–Olkin Length Biased Exponential: Bayesian and frequentist approach

The paper presents a novel statistical approach for analyzing daily coronavirus case and fatality statistics. The survival discretization method is used to generate a two-parameter discrete distribution, referred to as the "Discrete Marshall–Olkin Length Biased Exponential (DMOLBE) distribution". Because of the varied forms of its probability mass and failure rate functions, the DMOLBE distribution is adaptable. We derive the mean, variance, skewness, kurtosis, dispersion index (DI), hazard and survival functions, and second failure rate function of the suggested distribution. The DI demonstrates that the proposed model can represent both over-dispersed and under-dispersed data sets. The parameters of the DMOLBE distribution are estimated by maximum likelihood (ML) and Bayesian methods. The behavior of the ML estimates is checked via a comprehensive simulation study. The behavior of the Bayesian estimates is checked by running 10,000 iterations of a Markov chain Monte Carlo sampler, plotting the trace, and checking the proposed distribution. The simulation studies show that the bias and mean square error decrease as the sample size increases. To show the importance and flexibility of the DMOLBE distribution, two data sets on deaths due to coronavirus in China and Pakistan are analyzed. The DMOLBE distribution provides a better fit than some important discrete models, namely the discrete Burr-XII, discrete Bilal, discrete Burr-Hatke, discrete Rayleigh, and Poisson distributions. We conclude that the newly proposed distribution works well in analyzing these data sets. The data sets used in the paper were collected in 2020.

The primary purpose of this study is to introduce a new flexible probability distribution for modelling over-dispersed data sets. The mathematical properties of the new distribution, such as closed-form expressions for the pmf, cdf, moments, and other characteristics, are obtained. The maximum likelihood approach is used to estimate the model parameters. To offer an alternative approach for modelling over-dispersed data sets, the DMOLBE distribution is applied to data sets on the number of deaths due to Covid-19. The main motivations for and contributions of the DMOLBE model are:
• The distribution provides several hazard rate shapes, such as decreasing, increasing, or increasing-constant, which sets it apart from many other one- or two-parameter discrete distributions. Because of these hazard rate shapes, the suggested model can be used to model a variety of data sets.
• It provides a variety of pmf shapes suitable for modelling symmetric, positively skewed, or negatively skewed data that may not be successfully modelled by competing models.
• A number of statistical and reliability properties are derived, such as moments, probability functions, reliability indices, hazard function, reverse hazard rate, and second rate of failure.
• Compared with other discrete distributions in the literature, results from two practical applications show that the DMOLBE distribution fits the data sets satisfactorily.
• Maximum likelihood and Bayesian estimation methods are considered for estimating the parameters from observed data.
• The performance of the resulting estimators is assessed using extensive Monte Carlo simulations and accuracy metrics including mean squared errors and absolute biases. The results suggest that the parameter estimation approaches are adequate and efficient.
The study is divided into the following sections: "Methodology" covers the derivation and mathematical characteristics of the discrete Marshall–Olkin Length Biased Exponential distribution. "Parameter estimation" presents maximum likelihood estimation of the model parameters. "Bayesian estimation" describes the Bayesian approach. "Results and discussion" reports the simulation study and the results for all models on two real data sets. Finally, "Conclusion" brings the research to a close.

Methodology
In this section, we introduce a new discrete distribution, derive its statistical properties, and estimate the model parameters using the maximum likelihood approach.
The DMOLBE distribution and its properties. Let X be a random variable following the Marshall–Olkin Length Biased Exponential (MOLBE) distribution presented by Ahsanul-Haq et al. 21. The probability density function of the MOLBE distribution is:

The associated survival function is
The DMOLBE distribution is obtained using Eqs. (1) and (3). The pmf of the DMOLBE distribution is given by
The cdf of the DMOLBE distribution is as follows
where γ is the shape parameter and β is the scale parameter. Figure 1 depicts the behavior of the probability mass function of the DMOLBE distribution, which varies with the parameter values. Depending on the parameters, the pmf can be decreasing, positively skewed, or symmetric. This demonstrates the versatility of the suggested distribution in dealing with data of varying behaviour.
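The survival discretization step can be illustrated with a short sketch. It assumes the usual Marshall–Olkin transform of a length-biased exponential baseline, S(x) = γ(1 + x/β)e^(−x/β) / [1 − (1 − γ)(1 + x/β)e^(−x/β)], which may differ in parametrization from Eq. (3). The function names are illustrative, and Python is used here only for illustration (the paper's computations were done in R).

```python
import math

def lbe_survival(x, beta):
    """Survival function of the length-biased exponential baseline."""
    return (1.0 + x / beta) * math.exp(-x / beta)

def molbe_survival(x, gamma, beta):
    """Marshall-Olkin transform of the baseline survival (assumed form)."""
    s = lbe_survival(x, beta)
    return gamma * s / (1.0 - (1.0 - gamma) * s)

def dmolbe_pmf(x, gamma, beta):
    """Survival discretization: P(X = x) = S(x) - S(x + 1), x = 0, 1, 2, ..."""
    return molbe_survival(x, gamma, beta) - molbe_survival(x + 1, gamma, beta)

def dmolbe_cdf(x, gamma, beta):
    """P(X <= x) = 1 - S(x + 1)."""
    return 1.0 - molbe_survival(x + 1, gamma, beta)

# The pmf telescopes, so it sums to S(0) - S(infinity) = 1.
total = sum(dmolbe_pmf(x, 0.5, 2.0) for x in range(2001))
print(f"sum of pmf over 0..2000: {total:.12f}")
```

Because the pmf is a difference of consecutive survival values, the cdf and survival function of the discrete model follow directly from S without any extra summation.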

Survival and hazard function. The survival function of DMOLBED is as follows
The hazard rate function (hrf) of the DMOLBE distribution is given as follows
Figure 2 shows the behavior of the hazard function for different parameter values. It can be increasing or decreasing, which shows the flexibility of the model.
The second rate of failure. The second rate of failure of the DMOLBE distribution is defined as
Reverse hazard rate. The reverse hazard rate of the DMOLBE distribution is defined as:
Probability generating function and moments. Let X be a discrete random variable; then the probability generating function of the DMOLBE distribution is given as follows:
Differentiating G_X(Z) with respect to Z and setting Z = 1, we can obtain the factorial moments as
The DI indicates whether a distribution is suitable for modelling over- or under-dispersed data sets. If DI > 1, the distribution is over-dispersed; if DI < 1, it is under-dispersed. It is observed that the DMOLBE distribution shows over-dispersion when γ = 0.5 for different values of β. Conversely, the DMOLBE distribution shows under-dispersion when β = 0.5 for different values of γ.
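The dispersion index, DI = Var(X) / E(X), can also be evaluated numerically from the pmf. The sketch below is illustrative only: it assumes the Marshall–Olkin length-biased exponential survival function S(x) = γ(1 + x/β)e^(−x/β) / [1 − (1 − γ)(1 + x/β)e^(−x/β)], truncates the infinite sums at a hypothetical upper limit, and uses parameter pairs chosen purely as examples. Whether the printed values match the pattern reported in the paper depends on this parametrization agreeing with the paper's.

```python
import math

def molbe_survival(x, gamma, beta):
    """Assumed Marshall-Olkin length-biased exponential survival function."""
    s = (1.0 + x / beta) * math.exp(-x / beta)
    return gamma * s / (1.0 - (1.0 - gamma) * s)

def dmolbe_pmf(x, gamma, beta):
    """P(X = x) = S(x) - S(x + 1)."""
    return molbe_survival(x, gamma, beta) - molbe_survival(x + 1, gamma, beta)

def mean_var_di(gamma, beta, upper=5000):
    """Mean, variance and dispersion index from sums truncated at `upper`.

    `upper` must be large enough that the tail mass beyond it is negligible.
    """
    m1 = sum(x * dmolbe_pmf(x, gamma, beta) for x in range(upper + 1))
    m2 = sum(x * x * dmolbe_pmf(x, gamma, beta) for x in range(upper + 1))
    var = m2 - m1 ** 2
    return m1, var, var / m1

# Illustrative parameter pairs (not taken from the paper's tables).
for gamma, beta in [(0.5, 0.5), (0.5, 2.0), (2.0, 0.5)]:
    mean, var, di = mean_var_di(gamma, beta)
    print(f"gamma={gamma}, beta={beta}: mean={mean:.4f}, var={var:.4f}, DI={di:.4f}")
```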

Parameter estimation
Let (x_1, x_2, ..., x_n) be a random sample of size n from the DMOLBE distribution with probability mass function defined as
Then the log-likelihood function is given by:
Partially differentiating the log-likelihood with respect to γ and β, respectively, gives
Since it is difficult to find a closed-form solution of the nonlinear Eqs. (13, 14) in the unknown parameters γ and β, the system may be solved numerically using an iterative method such as Newton-Raphson, for example with the 'maxLik' package in R.
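As a rough illustration of numerical likelihood maximization, the sketch below maximizes the log-likelihood with a simple derivative-free compass search on (log γ, log β). This is not the Newton-Raphson routine of 'maxLik'; it is a stand-in that needs only the Python standard library. It assumes the survival function S(x) = γ(1 + x/β)e^(−x/β) / [1 − (1 − γ)(1 + x/β)e^(−x/β)], uses the Pakistan death counts listed in the paper as example data, and all function names, starting values, and tolerances are hypothetical.

```python
import math

# Daily deaths due to coronavirus in Pakistan, as listed in the paper.
DATA = [1, 6, 6, 4, 4, 4, 1, 20, 5, 2, 3, 15, 17, 7, 8, 25, 8, 25, 11, 25,
        16, 16, 12, 11, 20, 31, 42, 32, 23, 17, 19, 38, 50, 21, 14, 37, 23,
        47, 31, 24, 9, 64, 39]

def molbe_survival(x, gamma, beta):
    """Assumed Marshall-Olkin length-biased exponential survival function."""
    s = (1.0 + x / beta) * math.exp(-x / beta)
    return gamma * s / (1.0 - (1.0 - gamma) * s)

def loglik(data, gamma, beta):
    """Log-likelihood under P(X = x) = S(x) - S(x + 1)."""
    total = 0.0
    for x in data:
        p = molbe_survival(x, gamma, beta) - molbe_survival(x + 1, gamma, beta)
        if p <= 0.0:  # numerical underflow: treat the point as impossible
            return -math.inf
        total += math.log(p)
    return total

def fit_mle(data, start=(1.0, 5.0), step=1.0, tol=1e-6, bound=20.0):
    """Compass search on the log scale, which keeps both parameters positive."""
    theta = [math.log(start[0]), math.log(start[1])]
    best = loglik(data, *map(math.exp, theta))
    while step > tol:
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                cand = list(theta)
                cand[i] += delta
                if abs(cand[i]) > bound:  # stay inside a safe numeric range
                    continue
                val = loglik(data, math.exp(cand[0]), math.exp(cand[1]))
                if val > best:
                    theta, best, improved = cand, val, True
        if not improved:
            step /= 2.0  # shrink the step once no neighbour improves
    return math.exp(theta[0]), math.exp(theta[1]), best

gamma_hat, beta_hat, ll_hat = fit_mle(DATA)
print(f"gamma={gamma_hat:.4f}, beta={beta_hat:.4f}, loglik={ll_hat:.4f}")
```

A gradient-based optimizer will usually converge in far fewer evaluations; the compass search is used here only because it needs no derivatives.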

Bayesian estimation
The Bayesian approach treats the parameters as random: parameter uncertainty is expressed through a joint prior distribution specified before the failure data are obtained. The ability of the Bayesian technique to incorporate prior knowledge makes it particularly useful in reliability studies, where the lack of data is one of the major problems. The parameters γ and β of the DMOLBE distribution are assigned gamma priors, where γ and β are non-negative. With independent priors on γ and β, the joint prior density function can be expressed as follows:
The hyper-parameters of the informative priors were elicited following Dey et al. 23: the mean and variance of each gamma prior were equated with the ML estimate of the corresponding parameter and its variance, the latter taken from the inverse of the Fisher information matrix of γ and β. The joint posterior density function of γ and β is derived from the likelihood function of the DMOLBE distribution and the joint prior density:
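A minimal random-walk Metropolis-Hastings sketch for this posterior is given below. It is an illustration under stated assumptions, not the paper's implementation: it assumes the survival function S(x) = γ(1 + x/β)e^(−x/β) / [1 − (1 − γ)(1 + x/β)e^(−x/β)], uses hypothetical gamma hyper-parameters (shape 1, rate 0.1) rather than the elicited ones, uses normal proposals with hypothetical step sizes, and takes the Pakistan death counts listed in the paper as example data.

```python
import math
import random

# Daily deaths due to coronavirus in Pakistan, as listed in the paper.
DATA = [1, 6, 6, 4, 4, 4, 1, 20, 5, 2, 3, 15, 17, 7, 8, 25, 8, 25, 11, 25,
        16, 16, 12, 11, 20, 31, 42, 32, 23, 17, 19, 38, 50, 21, 14, 37, 23,
        47, 31, 24, 9, 64, 39]

def molbe_survival(x, gamma, beta):
    """Assumed Marshall-Olkin length-biased exponential survival function."""
    s = (1.0 + x / beta) * math.exp(-x / beta)
    return gamma * s / (1.0 - (1.0 - gamma) * s)

def log_posterior(gamma, beta, data, shape=(1.0, 1.0), rate=(0.1, 0.1)):
    """Log-likelihood plus independent gamma log-priors (up to a constant)."""
    if gamma <= 0.0 or beta <= 0.0:
        return -math.inf
    lp = (shape[0] - 1.0) * math.log(gamma) - rate[0] * gamma
    lp += (shape[1] - 1.0) * math.log(beta) - rate[1] * beta
    for x in data:
        p = molbe_survival(x, gamma, beta) - molbe_survival(x + 1, gamma, beta)
        if p <= 0.0:
            return -math.inf
        lp += math.log(p)
    return lp

def metropolis_hastings(data, n_iter=10_000, start=(1.0, 5.0),
                        sd=(0.2, 0.5), seed=1):
    """Random-walk MH with symmetric normal proposals; returns chain and acceptance rate."""
    rng = random.Random(seed)
    cur = start
    cur_lp = log_posterior(cur[0], cur[1], data)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = (rng.gauss(cur[0], sd[0]), rng.gauss(cur[1], sd[1]))
        prop_lp = log_posterior(prop[0], prop[1], data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):
            cur, cur_lp = prop, prop_lp
            accepted += 1
        chain.append(cur)
    return chain, accepted / n_iter
```

Calling `metropolis_hastings(DATA)` runs 10,000 iterations; in practice an initial burn-in portion of the chain is discarded and the trace is inspected before posterior means are computed.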

Results and discussion
In this section, the results from the Monte Carlo simulation and the real-life applications are discussed in detail. All numerical calculations were performed using the R language.
Simulation study.
The following simulation study is carried out to examine the behaviour of the Bayesian and maximum likelihood estimates of the DMOLBE distribution. The simulation is conducted using the following procedure.
1. Generate N = 10,000 samples of size n = 50, 100, 150, 200, and 300 from the DMOLBE distribution.
2. Estimate the parameters γ and β from each generated sample.
3. Compute the absolute biases (AB) and mean square errors (MSE) using the following equations.
For MLE:

For Bayesian
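The building blocks of such a simulation can be sketched as follows. The sampler uses inverse-transform sampling under the assumed survival function S(x) = γ(1 + x/β)e^(−x/β) / [1 − (1 − γ)(1 + x/β)e^(−x/β)], and the accuracy summaries use the common definitions AB = (1/N) Σ |θ̂_i − θ| and MSE = (1/N) Σ (θ̂_i − θ)², which may differ in detail from the paper's equations. All function names are illustrative.

```python
import math
import random

def molbe_survival(x, gamma, beta):
    """Assumed Marshall-Olkin length-biased exponential survival function."""
    s = (1.0 + x / beta) * math.exp(-x / beta)
    return gamma * s / (1.0 - (1.0 - gamma) * s)

def sample_dmolbe(n, gamma, beta, rng):
    """Inverse-transform sampling: smallest x with F(x) = 1 - S(x + 1) >= u."""
    out = []
    for _ in range(n):
        u = rng.random()
        x = 0
        while 1.0 - molbe_survival(x + 1, gamma, beta) < u:
            x += 1
        out.append(x)
    return out

def abs_bias_mse(estimates, true_value):
    """Average absolute bias and mean square error over replications."""
    n = len(estimates)
    ab = sum(abs(e - true_value) for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return ab, mse

rng = random.Random(2020)
print(sample_dmolbe(10, 0.5, 2.0, rng))
print(abs_bias_mse([1.0, 2.0, 3.0], 2.0))
```

In a full study, each of the N generated samples would be passed to the chosen estimator, and `abs_bias_mse` would be applied to the resulting N estimates of γ and of β separately.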
The simulation results are reported in Tables 2 and 3. The following points were concluded from the simulation results:
1. The estimated bias always decreases and approaches zero as n → ∞ for all combinations of parameters.
2. The estimated MSE decreases with an increase in sample size.
In the applications to real data, the parameters of the considered models are estimated using the maximum likelihood method. The performance of all fitted distributions is compared using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Kolmogorov-Smirnov (K-S) test with its corresponding p value. All computations are carried out in R.
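For reference, the two information criteria are AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ, where ℓ is the maximized log-likelihood, k the number of estimated parameters, and n the sample size; smaller values indicate a better trade-off between fit and complexity. The short sketch below computes both. The log-likelihood values in the example are hypothetical placeholders, not results from the paper.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * loglik

# Hypothetical maximized log-likelihoods for two candidate models, n = 66.
candidates = {"two-parameter model": (-310.0, 2), "one-parameter model": (-325.0, 1)}
for name, (ll, k) in candidates.items():
    print(f"{name}: AIC={aic(ll, k):.2f}, BIC={bic(ll, k, 66):.2f}")
```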
Data Set I (deaths due to coronavirus in China). The first data set is the number of deaths due to coronavirus in China from 23 January to 28 March 2020. The data set is reported at https://www.worldometers.info/coronavirus/country/china/. The data are: 8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89, 97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97, 150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22, 11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3 and 5. The MLEs with their corresponding standard errors and goodness-of-fit measures are presented in Table 4. Table 4 shows that the DMOLBE distribution fits the first data set better than the competing models, since its AIC and BIC are the smallest. Table 5 compares the ML and Bayesian estimates by their standard errors (SE) for the deaths due to coronavirus in China. From the results in Table 5, we conclude that Bayesian estimation is the better estimation method for this data set. Figure 6 shows the cdfs of the different distributions fitted to the first data set and Fig. 7 presents the P-P plots for all the competing models; both figures support the results in Table 4. Figure 8 shows that the estimates of the DMOLBE parameters for the China data exist and attain the maximum log-likelihood value. Figure 9 shows the MCMC results for the DMOLBE parameter estimates for the China data, confirming that the chains converge and that the posterior is approximately normal, like the proposed distribution.
Data Set II (daily deaths due to coronavirus in Pakistan). The second data set is the daily deaths due to coronavirus in Pakistan from 18 March to 30 June 2020. The data are reported at https://www.worldometers.info/coronavirus/country/Pakistan. The data are: 1, 6, 6, 4, 4, 4, 1, 20, 5, 2, 3, 15, 17, 7, 8, 25, 8, 25, 11, 25, 16, 16, 12, 11, 20, 31, 42, 32, 23, 17, 19, 38, 50, 21, 14, 37, 23, 47, 31, 24, 9, 64, 39. Table 6 shows that the DMOLBE distribution fits the second data set better than the competing models, since its AIC and BIC are the smallest. Table 7 compares the ML and Bayesian estimates by their standard errors (SE). From the results in Table 7, we conclude that Bayesian estimation is the better estimation method. Figure 10 shows the cdfs of the different distributions fitted to the second data set and Fig. 11 presents the P-P plots for all the competing models; both figures support the results in Table 6. Figure 12 shows that the estimates of the DMOLBE parameters for the Pakistan data exist and attain the maximum log-likelihood value. Figure 13 plot MCMC

Conclusion
This study introduces the DMOLBE distribution, a novel two-parameter discrete probability distribution that may be used in place of well-known distributions. Several of its mathematical characteristics are derived. The maximum likelihood and Bayesian estimation methods are used to estimate the distribution's parameters. The Bayesian estimates are obtained by MCMC using the Metropolis-Hastings (MH) algorithm.
A simulation study is conducted to evaluate the performance of the estimators of the unknown parameters in terms of AB and MSE. The ML and Bayesian estimation methods for the parameters of the DMOLBE distribution were compared through simulation. We conclude that the Bayesian estimation approach is superior for estimating the DMOLBE distribution parameters. The flexibility of the model is demonstrated using two real data sets, on which the proposed model performs better than the existing models considered. Further, estimation for the proposed model can be performed using transforms. As an extension of this study, future work will use regression analysis to predict future mortality rates in the countries under consideration.

Future work
Future work in statistical analysis for COVID-19 data holds great potential in advancing our understanding of the pandemic and informing evidence-based decision-making. One key area of focus is the integration of more comprehensive and diverse datasets, including demographic, socioeconomic, and healthcare variables, to explore the multifaceted aspects of COVID-19's impact on different populations. Advanced machine learning techniques can be applied to identify complex relationships and risk factors associated with the spread, severity, and outcomes of the virus. Furthermore, predictive modeling can be enhanced by incorporating real-time data streams and dynamic factors to provide more accurate and timely forecasts, aiding in proactive planning and resource allocation. Longitudinal studies analyzing the long-term effects of the pandemic and assessing the efficacy of interventions over time will provide valuable insights into the sustainability of public health measures. Additionally, ethical considerations and privacy-preserving methodologies should be integrated into future analyses to ensure data security and protect individuals' rights. Overall, future work in statistical analysis for COVID-19 data will continue to play a pivotal role in guiding public health policies, bolstering preparedness for future outbreaks, and ultimately safeguarding global health.

Data availability
All data are provided in the paper, together with references to their sources.